ESub: Exploration of Subgraphs. A tool for exploring models generated by Graph Mining algorithms
Authors
Abstract
In this demo we introduce ESub, a tool for visualizing the outcome of a frequent subgraph mining algorithm, namely SUBDUE. The tool supports a methodology we proposed in previous works for analyzing unstructured processes, based on the use of graphs. By exploiting graph-based techniques, it is possible to give the user a different perspective on a process, in which only the most relevant subprocesses (i.e., subgraphs) are displayed, rather than the complete, end-to-end process schema, which is often very chaotic in unstructured domains. Our tool allows the user to visualize and interact with such subgraphs. Furthermore, it allows the user to visualize the original graphs of the set and to compress them by means of the most relevant subgraphs, in order to obtain a simplified view of the overall process.

Copyright ©2015 for this paper by its authors. Copying permitted for private and academic purposes.

1 SubProcesses Analysis

Process Mining (PM) methods aim at extracting from an event log a process schema describing the flow of the performed activities [1]. However, when applied to so-called "Spaghetti Processes", i.e. processes with little or no structure, classical PM techniques usually generate a very chaotic model that a human analyst can hardly understand. As a remedy, in previous works [2] we proposed an alternative methodology for analyzing spaghetti processes, aimed at extracting their most relevant subprocesses. This approach exploits a graph-based technique; more precisely, it transforms the event log of the process into a set of graphs, each describing the execution of a certain process instance, and then extracts the subprocesses from these graphs. In this paper, we present ESub, a tool we developed to manage the set of subprocesses extracted from a spaghetti process, represented as subgraphs. ESub is designed to offer advanced functionalities for subprocess visualization and analysis. Furthermore, it provides the user with a flexible mechanism to simplify the overall process visualization through compression, i.e. by replacing each subgraph with a single node. The user can set the desired level of compression.

The current implementation of our tool manages the set of subgraphs extracted by the SUBDUE algorithm [4], which is a hierarchical clustering algorithm. However, it is easy to build adapters to apply our tool to results obtained from other frequent subgraph mining (FSM) algorithms. The outcome provided by SUBDUE is a hierarchy, where the top-level subgraphs are built by using only elements of the original graph set (i.e., nodes and edges), while lower-level subgraphs involve the higher-level subgraphs in their definition. Typically, the top-level subgraphs are the most relevant ones. SUBDUE also labels each subgraph according to the order in which it was extracted. In [3] we discussed a set of measures to evaluate the SUBDUE subgraphs; currently, ESub takes two of them into account: the frequency (FREQ), which is the number of occurrences of a subgraph in a graph set, and the representativeness (REP), which is the percentage of graphs in which a subgraph occurs at least once.

Our tool is implemented as a web application. Most of the available functionalities can be called from a left-side menu, organized in a set of tabs, as shown in Figure 1; the user can expand each tab by clicking on its label. The following subsections describe the two main groups of functionalities offered by the tool, i.e. the visualization and the compression of subgraphs.

1.1 SubProcesses Visualization

There are two main kinds of visualization functionalities: (a) the navigation and (b) the filtering of subgraphs.
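To make the hierarchy described in the overview more concrete, the following minimal Python sketch models subgraphs whose node lists may reference other subgraphs, and expands a lower-level subgraph back into elements of the original graphs. The `Subgraph` class, the `parents` and `expand` helpers, and the activity labels A, B and C are illustrative assumptions; only the labels SUB2 and SUB31 appear in the paper, and their contents here are invented. This is not ESub's internal representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Subgraph:
    """One entry of the hierarchy (hypothetical structure, not ESub's own)."""
    label: str
    # Each node is either an original activity label or the label of
    # another subgraph that this one involves in its definition.
    nodes: List[str]
    # Directed edges between entries of `nodes`, given as (source, target).
    edges: List[Tuple[str, str]] = field(default_factory=list)


def parents(sub: Subgraph, hierarchy: Dict[str, Subgraph]) -> List[str]:
    """Labels of the subgraphs that `sub` directly involves."""
    return [n for n in sub.nodes if n in hierarchy]


def expand(label: str, hierarchy: Dict[str, Subgraph]) -> List[str]:
    """Recursively replace subgraph references with original node labels."""
    flat: List[str] = []
    for n in hierarchy[label].nodes:
        if n in hierarchy:
            flat.extend(expand(n, hierarchy))  # nested subgraph: descend
        else:
            flat.append(n)                     # original element: keep
    return flat


# Invented example content: SUB2 uses only original elements,
# while SUB31 involves SUB2 in its definition.
hierarchy = {
    "SUB2": Subgraph("SUB2", ["A", "B"], [("A", "B")]),
    "SUB31": Subgraph("SUB31", ["SUB2", "C"], [("SUB2", "C")]),
}

print(parents(hierarchy["SUB31"], hierarchy))
print(expand("SUB31", hierarchy))
```

In this sketch, a subgraph with no parents corresponds to a top-level subgraph, and compressing a node amounts to showing its label instead of the expanded node list.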
As regards group (a), the user first has to use the load tab to upload the file of the subgraphs to visualize. From the load tab, the user can choose whether to upload the outcome returned by SUBDUE, that is a SUBS file, or a more generic DOT file, depending on which outcome is available. It is also possible, although not mandatory, to load a LOG file, which stores the FREQ and REP values of the subgraphs and can be exploited during the exploration. After the upload, each subgraph is displayed as a single node, with the label assigned to it by SUBDUE. This compact representation provides an overview of the complete hierarchy extracted by SUBDUE. The user can expand a node by selecting it and then using the expand tab, or simply by double-clicking the node. Similarly, it is possible to compress an expanded node by selecting it and using the compress tab, or by double-clicking the node. The expand/compress tabs also allow the user to expand/compress several nodes (possibly the entire hierarchy) at the same time. Figure 1 shows the uploaded subgraph set, with the nodes SUB2 and SUB31 expanded. Note that SUB2 is a "parent" of SUB31, since SUB31 involves SUB2, as we can see in the figure.

Using the Layout tab, it is possible to adjust the visualization according to the user's needs; in particular, it is possible to increase/decrease both the width and the height of the displayed outcome. It is also possible to enable/disable the movement of the nodes. Note that there is also a right-side menu, which displays the list of the subgraphs that are currently compressed/expanded; by clicking the name of a compressed/expanded subgraph, the corresponding node in the hierarchy is surrounded with a red square.

Fig. 1: Uploaded subgraphs set, with two subgraphs expanded

Functionalities of group (b) aim at helping users easily detect the most interesting subgraphs.
It is unrealistic to expect the user to analyze every single subgraph, since there can be hundreds of them. Hence, the tool implements some functionalities that support the search for, or the filtering of, specific subgraphs. The first and simplest one is the search tab, which allows the user to search the subgraphs by keywords. The other search functionality is provided by the filter tab, which detects the subgraphs that correspond to certain values of FREQ and/or REP. It is also possible to use the filter functionality without changing the default parameter settings; in this case, it lists all the top-level subgraphs, reporting their FREQ and REP values. Note that the filtering functionality takes into account only the top-level subgraphs; since they are assumed to be the most relevant ones, the FREQ and REP values are computed only for them.

The tool also allows the user to export either the entire hierarchy or a set of selected subgraphs. More precisely, the export functionality exports the visualized hierarchy in either SVG or DOT format; the corresponding file is automatically downloaded. Furthermore, it is also possible to select a set of subgraphs and visualize them in another web page; this is useful when only a subset of the hierarchy has to be analyzed. Note that in the new page, by selecting the export tab again, the tool generates the SVG or DOT file corresponding only to the exported portion of the hierarchy.

Fig. 2: Filtering Tab
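The FREQ and REP measures and the behaviour of the filter tab can be illustrated with a minimal Python sketch. It assumes that the number of occurrences of each top-level subgraph in each graph of the set is already known; finding those occurrences (a subgraph matching problem) is outside the sketch. The function names, the threshold parameters `min_freq` and `min_rep`, and the example counts are assumptions made for illustration and are not taken from ESub or from the LOG file format.

```python
from typing import Dict, List, Tuple


def freq(counts: List[int]) -> int:
    """FREQ: total number of occurrences of a subgraph in the graph set."""
    return sum(counts)


def rep(counts: List[int]) -> float:
    """REP: percentage of graphs in which the subgraph occurs at least once."""
    if not counts:
        return 0.0
    return 100.0 * sum(1 for c in counts if c > 0) / len(counts)


def filter_subgraphs(
    occurrences: Dict[str, List[int]],
    min_freq: int = 0,
    min_rep: float = 0.0,
) -> List[Tuple[str, int, float]]:
    """Return (label, FREQ, REP) for subgraphs meeting both thresholds.

    With the default thresholds every subgraph is listed, mirroring the
    described behaviour of the filter tab with unchanged parameters.
    """
    rows = []
    for label, counts in occurrences.items():
        f, r = freq(counts), rep(counts)
        if f >= min_freq and r >= min_rep:
            rows.append((label, f, r))
    return rows


# Invented data: occurrences of two top-level subgraphs in four graphs.
occurrences = {
    "SUB1": [2, 1, 0, 1],
    "SUB2": [0, 3, 0, 0],
}

print(filter_subgraphs(occurrences))              # all subgraphs
print(filter_subgraphs(occurrences, min_rep=50))  # only widely occurring ones
```

The example shows why the two measures are complementary: a subgraph can occur many times while being concentrated in a single graph, which gives it a high FREQ but a low REP.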
Similar resources
Arabesque: A System for Distributed Graph Mining - Extended version
Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, as for example finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very la...
CSV: Visualizing and Mining Cohesive Subgraphs
Extracting dense sub-components from graphs efficiently is an important objective in a wide range of application domains ranging from social network analysis to biological network analysis, from the World Wide Web to stock market analysis. Motivated by this need recently we have seen several new algorithms to tackle this problem based on the (frequent) pattern mining paradigm. A limitation of m...
Mining Frequent Patterns in Evolving Graphs
Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous applications ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming natu...
Clustering Improves the Exploration of Graph Mining Results
Mining frequent subgraphs is an area of research where we have a given set of graphs, and where we search for (connected) subgraphs contained in many of these graphs. Each graph can be seen as a transaction, or as a molecule — as the techniques applied in this paper are used in (bio)chemical analysis. In this work we will discuss an application that enables the user to further explore the resul...
A survey of frequent subgraph mining algorithms
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired...
Publication date: 2015